Skip to content

Conversation

@dbtsai
Copy link
Member

@dbtsai dbtsai commented Dec 2, 2025

What changes were proposed in this pull request?

In recent LZ4 versions, safeDecompressor has become highly optimized and can be as fast, or even sometimes faster, than fasterDecompressor. So it does make sense to switch to safeDecompressor.

Why are the changes needed?

It is recommended to switch to .safeDecompressor(), which is not vulnerable and provides better performance per https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Dec 2, 2025
@dbtsai dbtsai changed the title [SPARK-54571] [Core] Use LZ4 safeDecompressor [SPARK-54571][CORE] Use LZ4 safeDecompressor Dec 2, 2025
@dbtsai
Copy link
Member Author

dbtsai commented Dec 3, 2025

It's more involving than I thought as LZ4BlockInputStream doesn't take safeDecompressor. I will take a deeper look tomorrow.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, please make the CI happy. Thank you, @dbtsai .

@dbtsai dbtsai changed the title [SPARK-54571][CORE] Use LZ4 safeDecompressor [WIP][SPARK-54571][CORE] Use LZ4 safeDecompressor Dec 3, 2025
LuciferYang
LuciferYang previously approved these changes Dec 3, 2025
@LuciferYang
Copy link
Contributor

It seems that the fix for CVE‐2025‐12183 wasn't implemented until version 1.8.1, but Spark is still using version 1.8.0.

https://github.com/yawkat/lz4-java/releases

@LuciferYang LuciferYang dismissed their stale review December 3, 2025 08:05

Spark still use 1.8.0

@yawkat
Copy link

yawkat commented Dec 3, 2025

Note that LZ4BlockInputStream does not support safeDecompressor in lz4-java 1.8.1. If you upgrade to that version, it will still work and be secure, but performance will be much worse than in 1.8.0.

lz4-java 1.10.0 introduces a new builder for LZ4BlockInputStream that accepts a safeDecompressor.

@dongjoon-hyun dongjoon-hyun marked this pull request as draft December 3, 2025 17:38
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reviewers, just for the record, LZ4 1.10.0 is not published to the Maven Central yet.

Image

@yawkat
Copy link

yawkat commented Dec 3, 2025

It is published, but only under the new group id

@SteNicholas
Copy link
Member

@yawkat, which group id does 1.10.0 publish?

@LuciferYang
Copy link
Contributor

we need change to use

<dependency>
    <groupId>at.yawk.lz4</groupId>
    <artifactId>lz4-java</artifactId>
    <version>1.10.0</version>
</dependency>

https://repo.maven.apache.org/maven2/at/yawk/lz4/lz4-java/

image

@dongjoon-hyun
Copy link
Member

Thank you for the updated info, @yawkat , @LuciferYang .

Could you update this PR, @dbtsai ?

@dongjoon-hyun
Copy link
Member

To all, in order to help this PR, I made an independent PR for dependency upgrade.

@dongjoon-hyun
Copy link
Member

We upgraded to 1.10.0 a few minutes ago. Could you rebase this PR to master branch once more, @dbtsai ?

@yawkat
Copy link

yawkat commented Dec 5, 2025

I recommend you wait a few hours with releasing this. Another (smaller, unrelated) CVE has been found in lz4-java.

@yawkat
Copy link

yawkat commented Dec 5, 2025

CVE-2025-66566 has been published and fixed in 1.10.1. I suggest you move to that version. Though cloudflare seems to be having some trouble that breaks maven central at the moment.

@dongjoon-hyun
Copy link
Member

I recommend you wait a few hours with releasing this. Another (smaller, unrelated) CVE has been found in lz4-java.

Just FYI, we have no intention to hurry this, @yawkat . To be safe, this will be tested in master branch for next few months in Apache Spark 4.2.0 timeframe. Currently, we are voting on Apache Spark 4.1.0, it's too late for this kind of risky dependency change.

That's the main reason why LZ4 1.10.0 PR is only in master branch. Dependency change and code change is only building a ground to test further in various workloads.

@dongjoon-hyun
Copy link
Member

Gentle ping once more, @dbtsai .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants